{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading a Segmentation Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this first step, we will explore how to load **segmentation datasets** into FiftyOne. Segmentation datasets may be of two types: **semantic segmentation** (pixel-wise class labels) and **instance segmentation** (individual object masks). \n",
    "\n",
    "FiftyOne makes it easy to load both types using its Dataset Zoo or from custom formats like COCO or FiftyOne format. Let's start by loading a common format instance segmentation dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading a Common Format Segmentation Dataset\n",
    "Segmentation datasets are often provided in standard formats such as COCO, VOC, YOLO, KITTI, and FiftyOne format. FiftyOne supports direct ingestion of these datasets with just a few lines of code.\n",
    "\n",
    "Make sure your dataset follows the folder structure and file naming conventions required by the specific format (e.g., COCO JSON annotations or class mask folders for semantic segmentation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install fiftyone huggingface_hub\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import fiftyone as fo\n",
    "\n",
    "# Create the dataset\n",
    "name = \"my-dataset\"\n",
    "dataset_dir = \"/path/to/segmentation-dataset\"\n",
    "\n",
    "# Create the dataset\n",
    "dataset = fo.Dataset.from_dir(\n",
    "    dataset_dir=dataset_dir,\n",
    "    dataset_type=fo.types.COCODetectionDataset, # Change with your type\n",
    "    name=name,\n",
    ")\n",
    "\n",
    "# View summary info about the dataset\n",
    "print(dataset)\n",
    "\n",
    "# Print the first few samples in the dataset\n",
    "print(dataset.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check out the docs for each format to find optional parameters you can pass for things like train/test split, subfolders, or label paths, check more in the User Guide of [Using Datasets](https://docs.voxel51.com/user_guide/using_datasets.html#dataset-media-type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# FiftyOne with a Coffee-Beans Dataset\n",
    "We will walk through how to use [FiftyOne](https://voxel51.com/docs/fiftyone) to build better segmentation datasets and models. \n",
    "\n",
    "- Load your own dataset [into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html). For this example, we use a [Coffee-Beans Dataset](https://huggingface.co/datasets/pjramg/colombian_coffee) in COCO format.\n",
    "- Use FiftyOne [in a notebook](https://voxel51.com/docs/fiftyone/environments/index.html#notebooks)\n",
    "- Explore your segmentation dataset using [views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) and the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n",
    "\n",
    "*Note: To load the dataset locally, visit the Coffee-Beans Dataset page on Hugging Face, download the files, and then load them using the following command.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "bat"
    }
   },
   "outputs": [],
   "source": [
    "!git clone https://huggingface.co/datasets/pjramg/colombian_coffee"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you only see small pointer files instead of the actual images, it means Git LFS wasn’t used.\n",
    "In that case, use Git LFS to pull the full dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo apt install git-lfs\n",
    "!git lfs install"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import fiftyone as fo\n",
    "\n",
    "dataset = fo.Dataset.from_dir(\n",
    "    dataset_type=fo.types.COCODetectionDataset,\n",
    "    dataset_dir=\"./colombian_coffee\",\n",
    "    data_path=\"images/default\",\n",
    "    labels_path=\"annotations/instances_default.json\",\n",
    "    label_types=\"segmentations\",\n",
    "    label_field=\"categories\",\n",
    "    name=\"coffee\",\n",
    "    include_id=True,\n",
    "    overwrite=True\n",
    ")\n",
    "\n",
    "# View summary info about the dataset\n",
    "print(dataset)\n",
    "\n",
    "# Print the first few samples in the dataset\n",
    "print(dataset.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see our images have loaded in the App, but no segmentation masks are shown yet. Next, we’ll ensure annotations are properly loaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = fo.launch_app(dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the App\n",
    "\n",
    "With the FiftyOne App, you can visualize your samples and their segmentation masks in an interactive UI. Double-click any sample to enter the expanded view, where you can study individual samples with overlayed masks.\n",
    "\n",
    "The [view bar](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar) lets you filter and search your dataset to analyze specific classes or objects.\n",
    "\n",
    "You can seamlessly move between Python and the App. For example, create a filtered view using the `Shuffle()` and `Limit()` stages in Python or directly in the App UI."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once your annotations are loaded correctly, you can confirm that your **segmentation masks** (not detections!) are present and visualized correctly. 🎉"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![coffe_beans](https://cdn.voxel51.com/getting_started_segmentation/notebook1/coffe_beans.webp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "OSS310",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
